Address decoding circuit performing a multi-bit shift operation in a single clock cycle

ABSTRACT

A decoder is disclosed that is used to select an area of address space in an Integrated Circuit. The decoder uses a hardware shifting module that performs shift operations on constants. Such a structure reduces an overall area consumption of the shifting module. Additionally, the decoder can perform a multi-bit shift operation in a single clock cycle.

CROSS-REFERENCE TO RELATED APPLICATION

This is a continuation of U.S. application Ser. No. 15/468,704, filed Mar. 24, 2017, which is a continuation of U.S. application Ser. No. 15/083,077, filed Mar. 28, 2016 (now U.S. Pat. No. 9,607,682), which applications are incorporated herein by reference in their entirety.

BACKGROUND

A System on Chip (SoC) is a type of integrated circuit (IC) that combines many or all components of a computer into a single chip. Traditionally, SoCs partition the address space into different regions (DRAM regions, internal chip addresses, bus addresses, etc.). Each transaction in the internal SoC fabric is address-decoded in order to identify the required destination, and the transaction is routed accordingly. Typically, an address decoder can be used that has address bits as inputs and selection lines of the destinations as outputs. For example, the SoC can include multiple master/slave combinations and each master can include a separate decoder that selects the appropriate slave using the selection lines.

Address decoding typically uses inequality comparators, wherein two inequality comparators are used per slave window. Largely, due to the inequality comparators, decoders are very area expensive. And, due to the large number of slaves, decoder inefficiency has become problematic overall in terms of area usage in a SoC or other integrated circuit.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system diagram showing a decoder hardware design for selecting one of a plurality of memory regions based on an input transaction address; the decoder hardware design is area optimized and does not use inequality comparators.

FIG. 2 shows an integrated circuit in which the decoder hardware of FIG. 1 can be used.

FIG. 3 shows a particular embodiment of decoder hardware for selecting one of a plurality of memory regions based on an input transaction address.

FIG. 4 shows an embodiment of shifting hardware that can be used in the decoder hardware.

FIG. 5 is a flowchart of a method for decoding a transaction address according to one embodiment.

FIG. 6 is a flowchart of a method for decoding a transaction address with additional process blocks that can be used with the FIG. 5 embodiment.

FIG. 7 depicts a generalized example of a suitable computing environment in which the described innovations may be implemented.

DETAILED DESCRIPTION

Decoders that include inequality comparators are area expensive. Shift operations on input variables in a SoC or other integrated circuit can also be expensive in terms of area. However, when performing shift operations on constants, area consumption is comparatively less. Thus, it is desirable to provide a decoder that performs a multi-bit shift operation in one clock cycle and in which the shifting hardware operates on a constant. The single shift operation, together with shifting hardware that operates on the constant, reduces the area consumed by the shifting hardware.

FIG. 1 shows a decoder 102 that receives a transaction address 110 and other inputs 111 (described further below) and that provides a selection signal 112 corresponding to which region of an address space 120 has been selected. The transaction address 110 typically has a first portion used to generate the selection signal 112 and a second portion that represents an offset into the selected memory region, wherein the sizing of the first and second portions changes for each region of the address space. The address space 120 defines a range of discrete addresses that can correspond to a memory cell, a peripheral device, a disk sector, an I/O device or other logical or physical entity. Each region has a corresponding base address and window size. For example, a DRAM high region, shown at 122, has a base address of 01_0000_0000 and a window size of 4 Gigabytes so that the next region of memory starts at 02_0000_0000. Although seven different regions are shown for the address space 120, any number of regions can be used. The illustrated regions include 1) DRAM high, reclaim address space; 2) DRAM high; 3) a boot sector; 4) hardware engine 1; 5) hardware engine 2; 6) PCIe bus addresses; and 7) DRAM low. The base addresses for each region are shown at 124, and the addresses chosen are specific to the particular implementation. In the illustrated embodiment, the address space 120 is an operating system view of memory in an integrated circuit in which the decoder 102 sits. However, different configurations of the address space can be used.

Details of the decoder 102 are shown in the decoder logic 130. Each region of the address space 120 has similar decoder logic 130 associated therewith. For example, if the illustrated decoder logic 130 is associated with the DRAM high region 122, then a hit/miss output 134 indicates a hit when the transaction address 110 is pointing within that region; otherwise it indicates a miss. The decoder logic 130 includes shifting hardware 140, XOR or XNOR logic 142, and combinatorial logic 144. The shifting hardware 140 performs a bitwise operation that shifts all bits of an operand either right or left. The shifting hardware 140 can shift by any number of bit positions in a single clock cycle and it is the only shifting operation needed by the decoder 102. Thus, the shifting hardware 140 can shift by 1 bit, 2 bits, or N bits (where N is limited by the size of the shifting hardware). The shifting hardware 140 uses a constant value 150 input as the operand. The constant value 150 can be a bitwise logical 1 (e.g., 11111 . . . 1) and the shifting hardware 140 can shift left. A logical 0 is shifted in from the right as logical 1's are shifted out. The other inputs 111 include a window size signal 152 and a base address of the slave region 160. The number of bit positions that the shifting hardware shifts is determined by the window size signal 152, which is associated with a window size of the region. Continuing with the above example, the window size can be associated with the size of the DRAM high region 122. An output of the shifting hardware 140 is a mask signal 154. As further described below, the shifting hardware 140 only performs a single shift operation in one clock cycle so that the decoder logic 130 can generate a hit/miss output 134 in one clock cycle. The base address of the slave region 160 and the transaction address 110 are input into the XOR or XNOR logic 142. An example base address is 01_0000_0000 for the DRAM high region 122. The output of XOR/XNOR logic 142 is combined with mask signal 154 in the combinatorial logic 144. In one example, the combinatorial logic 144 can be a combination of an AND gate and an OR gate. Other logic combinations can be used. The resultant output 134 indicates whether the transaction address 110 is within the associated region of address space. For example, if the transaction address is between 01_0000_0000 and 02_0000_0000, and the decoder logic 130 is associated with the DRAM high region 122, then the hit/miss output 134 indicates a hit. Otherwise, a miss is indicated.
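The data path of the decoder logic 130 can be modeled in software as a minimal sketch. The function names, the 40-bit address width, and the use of Python are assumptions made for illustration only; the disclosure describes hardware and does not fix an address width. The sketch uses the DRAM high example (base address 01_0000_0000, window size of 4 Gigabytes, i.e., a shift of 32 bit positions).

```python
def make_mask(shift: int, width: int) -> int:
    """Shift a constant all-ones word left by `shift`, shifting zeros in from the right."""
    ones = (1 << width) - 1          # constant operand 111...1 (constant value 150)
    return (ones << shift) & ones    # ones shifted out on the left are dropped


def region_hit(addr: int, base: int, shift: int, width: int) -> bool:
    """Return True when `addr` falls inside the region that starts at `base`."""
    diff = addr ^ base                       # XOR logic: 1 wherever the bits differ
    masked = diff & make_mask(shift, width)  # ignore the offset bits below the window size
    return masked == 0                       # OR all bits together, then invert


WIDTH = 40                       # assumed address width (hypothetical)
DRAM_HIGH_BASE = 0x01_0000_0000  # base address from the example above
DRAM_HIGH_SHIFT = 32             # 2**32 bytes = 4 Gigabytes

print(region_hit(0x01_2345_6789, DRAM_HIGH_BASE, DRAM_HIGH_SHIFT, WIDTH))
print(region_hit(0x02_0000_0000, DRAM_HIGH_BASE, DRAM_HIGH_SHIFT, WIDTH))
```

The first call models an address inside the DRAM high region and the second an address at the start of the next region.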

The indicated address space 120 can have seven different decoders 130 that each receive the transaction address 110 in parallel. Each decoder is assigned to one of the regions in the address space 120. Thus, for each transaction address, one of the decoders indicates a hit and selects the corresponding region of the address space. A simple example is shown below for a decoder similar to decoder 102 of FIG. 1, but for three different address regions called slave 0, slave 1, and slave 2. Three different decoder configurations are aligned in parallel and each receives the transaction address, which in this example is 10010. A bitwise comparison of the base address and transaction address is performed by an XOR gate in each decoder. The Tmp mask field is a constant input of 11111 and acts as the operand into the shifting hardware. The shift left mask is generated by shifting the operand 3 bits, corresponding to a window size of 8 (2³=8) for slave 0, 2 bits for slave 1, and 3 bits for slave 2. Thus, the following formula can be used to determine the number of bits that the shifting hardware is shifted:

2^(Shifting select) = window size.

The shifting select exponent is a representation of a number of addresses in the region of address space. Notably, different window sizes can be used for the different regions of the address space.
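As a small illustration of the formula, the shifting select value can be recovered from a window size by taking the base-2 logarithm. The helper name below is hypothetical, and the sketch assumes that window sizes are exact powers of two, which the formula implies but the text does not state as a requirement.

```python
def shift_select(window_size: int) -> int:
    """Return s such that 2**s == window_size (window_size assumed to be a power of two)."""
    if window_size <= 0 or window_size & (window_size - 1):
        raise ValueError("window size must be a power of two")
    return window_size.bit_length() - 1


print(shift_select(8))        # window of 8 addresses, as for slave 0
print(shift_select(4))        # window of 4 addresses, as for slave 1
print(shift_select(4 << 30))  # 4 Gigabytes, as for the DRAM high region
```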

The following table shows the operations for the decoders:

          Base Addr   Trans   XOR between Trans   Tmp     Shift
          [binary]    Addr    and Base Addr       Mask    Left Mask
Slave 0   00000       10010   10010               11111   11000
Slave 1   01000       10010   11010               11111   11100
Slave 2   10000       10010   00010               11111   11000

As shown below, the shift-left mask is applied to the output of the XOR gate. More particularly, a bitwise AND operation is performed on the shift-left mask and the output of the XOR gate. The result column can be considered an intermediate result or a final result. A result that is all 0's for Slave 2 is indicative of a hit.

          XOR between Trans   Shift
          and Base Addr       Left Mask   Result
Slave 0   10010               11000       10000
Slave 1   11010               11100       11000
Slave 2   00010               11000       00000

The intermediate result can be converted to a single bit by passing the result through an OR gate and inverter to obtain a final result, as shown below. As indicated, the transaction address results in selection of an address region corresponding to slave 2.

          Result   Collapse (OR)   NOT
Slave 0   10000    1               0
Slave 1   11000    1               0
Slave 2   00000    0               1
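The three-slave example can be reproduced with a short software model. This is a sketch for checking the arithmetic in the tables, not a description of the hardware; the names SLAVES and decode are hypothetical, while the base addresses, shift amounts, Tmp mask, and transaction address are taken from the example.

```python
TMP_MASK = 0b11111     # constant operand of the shifting hardware
TRANS_ADDR = 0b10010   # transaction address from the example

# (name, base address, number of bit positions shifted)
SLAVES = [
    ("Slave 0", 0b00000, 3),
    ("Slave 1", 0b01000, 2),
    ("Slave 2", 0b10000, 3),
]


def decode(trans_addr: int):
    """Return one row per slave: (name, xor, mask, result, collapse, hit)."""
    rows = []
    for name, base, shift in SLAVES:
        xor = trans_addr ^ base                # bitwise comparison with the base address
        mask = (TMP_MASK << shift) & TMP_MASK  # shift left, zeros shifted in
        result = xor & mask                    # apply the shift-left mask
        collapse = int(result != 0)            # OR of all result bits
        hit = collapse ^ 1                     # inverter: 1 indicates a hit
        rows.append((name, xor, mask, result, collapse, hit))
    return rows


for name, xor, mask, result, collapse, hit in decode(TRANS_ADDR):
    print(f"{name}  {xor:05b}  {mask:05b}  {result:05b}  {collapse}  {hit}")
```

Only the row for slave 2 ends with a hit bit of 1, matching the final table.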

FIG. 2 illustrates an embodiment of an Integrated Circuit (IC) 200 having a master/slave configuration with multiple master modules 210, 212, 214 coupled in parallel. As indicated by dots 216, any number of master modules can be used. The master modules can include any desired hardware, but each includes a decoder 220, 222, and 224, respectively. The decoders 220, 222, and 224 can be equivalent hardware except for a base address that is applied to each decoder and a window size parameter. The decoders 220, 222, 224 can be coupled to a bus that is also coupled to multiple slaves 230, 232, 234, and 236. One of the slaves is enabled in response to the decoders 220, 222, and 224 selection signals. The transaction address offset is then used to access (read from or write to) the selected slave address. The decoders 220, 222, 224 can be similar to the decoder logic 130 of FIG. 1.

In an alternative embodiment, each decoder 220, 222, and 224 can be coupled to one slave using a select line (instead of the bus). The select line can be pulled inactive and become active only upon a hit signal from the decoder. In this way, the slave can be enabled and the offset of the transaction address can be used to read from or write to the selected address corresponding to the transaction address.

FIG. 3 shows a detailed view of the decoders 220 and 224 of FIG. 2 (decoder 222 has a similar structure). In decoder 220, a window size for slave 0 can be input into an optional −1 operation shown at 310. In some embodiments, it is desirable to subtract 1 from the size parameter. However, the −1 operation can be removed, if desired, as indicated by the dashed lines of box 310. The window size (or the window size −1) is input into shifting hardware 320 as a select input that selects a number of bits that an input operand is shifted. Typically, the window size is an exponent of base 2 that can be used to compute a total number of addresses within an address region. The input operand is shown as a constant 11111 . . . 11 that can be supplied from a multibit bus. Other constant values can be used. The number of bits of the constant should be large enough to support a maximum select input of the shifting hardware 320. The output of the shifting hardware 320 is a mask output. In parallel with the shifting hardware 320, a first hardware logic 330 receives a base address for slave 0 and the transaction address. The first hardware logic 330 can be a bitwise logical XOR gate that performs a bit-by-bit comparison of the base address for slave 0 and the transaction address. The output of the first hardware logic 330 is input into a second hardware logic 340, which logically combines the mask output from the shifting hardware 320 and the first hardware logic output. The second hardware logic 340 can include a bitwise logical AND gate that combines both inputs and produces an output that is indicative of whether the transaction address is within the region of address space (i.e., the slave 0 region). A logical NOR gate 350 can combine (also called collapse) all of the bits to produce a single-bit hit/miss output. The NOR gate 350 can be replaced with an OR gate, an AND gate or a NAND gate. The hit/miss output is also indicative of whether the transaction address is within the slave 0 region of address space.
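The text states that the NOR gate 350 can be replaced with other gates but does not spell out the surrounding logic in each case. The sketch below shows one possible reading, offered as an assumption rather than as the disclosed circuit: when the first hardware logic is an XNOR (1 where bits match), the mask can be applied with a bitwise OR against the inverted mask and the bits collapsed with an AND. The function names are hypothetical.

```python
def hit_xor_and_nor(addr: int, base: int, mask: int) -> bool:
    """XOR compare, AND with the mask, NOR collapse (the arrangement described for FIG. 3)."""
    return ((addr ^ base) & mask) == 0


def hit_xnor_or_and(addr: int, base: int, mask: int, width: int) -> bool:
    """Assumed alternative: XNOR compare, OR with the inverted mask, AND collapse."""
    ones = (1 << width) - 1
    same = ~(addr ^ base) & ones        # XNOR: 1 wherever the bits match
    qualified = same | (~mask & ones)   # offset bits are forced to 1 (don't care)
    return qualified == ones            # AND of all bits


WIDTH = 5
MASK = 0b11000  # shift-left mask for a window size of 8
print(hit_xor_and_nor(0b10010, 0b10000, MASK))
print(hit_xnor_or_and(0b10010, 0b10000, MASK, WIDTH))
```

Both functions return the same answer for every input of a given width, which follows from De Morgan's laws.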

The decoder 224 can have a similar structure to decoder 220. However, the window size and the base address are for slave N (wherein N is any number). The transaction address is the same transaction address supplied to each decoder in parallel, and only one of the decoders typically has a hit, while the others have a miss. The decoders 220, 224 notably use a shift left of a constant operand by a number of bits dictated by a window size. Such a structure provides a considerable area advantage over shift registers that do not operate on a constant. Likewise, the above structure uses considerably less area than decoders that use inequality comparators.
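To illustrate why the mask-based structure can stand in for a pair of inequality comparators, the sketch below compares the two approaches exhaustively over a 5-bit address space. It assumes that each window size is a power of two and that each base address is aligned to its window size; the text does not state these conditions explicitly, and the function names are hypothetical.

```python
def hit_with_comparators(addr: int, base: int, size: int) -> bool:
    """Conventional decode: two inequality comparisons per window."""
    return base <= addr < base + size


def hit_with_mask(addr: int, base: int, shift: int, width: int) -> bool:
    """Mask-based decode: XOR, shifted constant mask, AND, collapse."""
    ones = (1 << width) - 1
    mask = (ones << shift) & ones
    return ((addr ^ base) & mask) == 0


WIDTH = 5
mismatches = 0
for shift in range(WIDTH + 1):
    size = 1 << shift
    for base in range(0, 1 << WIDTH, size):  # bases aligned to the window size (assumption)
        for addr in range(1 << WIDTH):
            if hit_with_comparators(addr, base, size) != hit_with_mask(addr, base, shift, WIDTH):
                mismatches += 1

print(mismatches)
```

Under the stated alignment assumption the two functions agree on every combination; with an unaligned base address they would not, so the comparison should not be read as holding in general.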

FIG. 4 is an example of shifting hardware 400 that can be used in any of the embodiments described herein. The shifting hardware 400 can include a plurality of multiplexers 410 coupled in series. Specifically, multiplexers 420, 430 and 440 are coupled in series with different bits of a multiple bit select line controlling the multiplexers. The select lines shown as sel[0], sel[1], and sel[2] can be constant voltages based on a window size of the slave device. The inputs to the multiplexers are either constants, an output of a previous multiplexer, or both. The number of multiplexers needed is greatly reduced (compared to other designs) by using the constant inputs and by using a shift left operation.
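A series of multiplexers of this kind can be modeled as follows. The sketch assumes a conventional logarithmic arrangement in which the stage controlled by sel[k] either passes its input through or shifts it left by 2^k bit positions; the figure is not reproduced here, so the exact wiring, the 8-bit width, and the function names are assumptions.

```python
def mux(sel: int, when_0: int, when_1: int) -> int:
    """Two-input multiplexer: returns when_1 if sel is 1, otherwise when_0."""
    return when_1 if sel else when_0


def shift_left_constant(sel: int, width: int = 8) -> int:
    """Shift a constant all-ones word left by `sel` (0..7) using three mux stages."""
    ones = (1 << width) - 1
    stage = ones                               # constant operand feeding the first stage
    for k in range(3):                         # stages controlled by sel[0], sel[1], sel[2]
        sel_k = (sel >> k) & 1
        shifted = (stage << (1 << k)) & ones   # shift by 1, 2, or 4; zeros shifted in
        stage = mux(sel_k, stage, shifted)
    return stage


for n in range(8):
    print(n, f"{shift_left_constant(n):08b}")
```

Because the operand is a constant, both inputs of the first stage are fixed patterns (11111111 and 11111110 in this model), which is consistent with the statement that multiplexer inputs are constants, outputs of a previous multiplexer, or both.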

FIG. 5 is a flowchart of a method for decoding. In process block 510, an address space is divided into multiple regions. Each region includes a base address and a size (also called a window size) that is associated with a number of addresses in the region. Each address can be any number of bytes, such as 1, 2, 4, 8, or 16 bytes. In process block 520, a transaction address is received that is to be decoded. The transaction address is associated with one of the addresses within the different regions and the decoder determines which region. In process block 530, the transaction address is logically combined with a base address of one of the regions of interest to produce a first result. The logical combining can be accomplished using an XOR gate or an XNOR gate. Such logical combining can be a bitwise comparison of the two inputs. In process block 540, a shift left operation can be used wherein a number of bit positions shifted is controlled by a size of the region of address space with which the decoder is associated. The shift-left operation can be the only shifting operation needed to decode an address. Additionally, the shifting can occur in one clock cycle. An operand of the shifting hardware can be a constant and the shifting hardware can shift in logical 0's as the constant is shifted left. An output of the shift operation can be a mask value. In process block 550, the mask value can be applied to the first result to obtain a second result that indicates whether the transaction address is within the region of address space associated with the decoder.

FIG. 6 is a flowchart according to a method that can be used to decode an incoming address. The process blocks of FIG. 6 can be additional process blocks to FIG. 5. In process block 610, a shift operation, such as the one described in FIG. 5, is performed in one clock cycle using a single shift-left operation. Additionally, a constant can be used as the operand being shifted. Still further, during the decoding process of the transaction address, the shift-left operation can be the only shift performed. The shifting hardware performing the shift operation can include a plurality of multiplexers coupled in series with select lines of the multiplexers coupled to an input that corresponds to a size of the corresponding region. In process block 620, all of the bits resulting from process block 550 in FIG. 5 can be logically combined to produce a single, one-bit hit/miss result. The logical combining can be accomplished using an OR gate or a NOR gate. The output of the logical combining can be a single bit indicative of a hit or miss.

FIG. 7 depicts a generalized example of a suitable computing environment 700 in which the described innovations may be implemented. For example, an integrated circuit herein can be positioned within the computing environment. The computing environment 700 is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems. For example, the computing environment 700 can be any of a variety of computing devices (e.g., desktop computer, laptop computer, server computer, tablet computer, etc.).

With reference to FIG. 7, the computing environment 700 includes one or more processing units 710, 715 and memory 720. The computing environment can also include an IC 725 that incorporates the decoder technology described herein. In FIG. 7, this basic configuration 730 is included within a dashed line. The processing units 710 and 715 execute computer-executable instructions. A processing unit can be a general-purpose central processing unit (CPU), processor in an application-specific integrated circuit (ASIC) or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 7 shows a central processing unit 710 as well as a graphics processing unit or co-processing unit 715. The tangible memory 720 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s). The memory 720 stores software 780 implementing one or more innovations described herein, in the form of computer-executable instructions suitable for execution by the processing unit(s). The software 780 can also be integrated into the IC.

A computing system may have additional features. For example, the computing environment 700 includes storage 740, one or more input devices 750, one or more output devices 760, and one or more communication connections 770. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 700. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 700, and coordinates activities of the components of the computing environment 700.

The tangible storage 740 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory way and which can be accessed within the computing environment 700. The storage 740 stores instructions for the software 780 implementing one or more innovations described herein.

The input device(s) 750 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 700. The output device(s) 760 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment 700.

The communication connection(s) 770 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.

Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed methods can be used in conjunction with other methods.

Any of the disclosed methods can be implemented as computer-executable instructions stored on one or more computer-readable storage media (e.g., one or more optical media discs, volatile memory components (such as DRAM or SRAM), or non-volatile memory components (such as flash memory or hard drives)) and executed on a computer (e.g., any commercially available computer, including smart phones or other mobile devices that include computing hardware). The term computer-readable storage media does not include communication connections, such as signals and carrier waves. Any of the computer-executable instructions for implementing the disclosed techniques as well as any data created and used during implementation of the disclosed embodiments can be stored on one or more computer-readable storage media. The computer-executable instructions can be part of, for example, a dedicated software application or a software application that is accessed or downloaded via a web browser or other software application (such as a remote computing application). Such software can be executed, for example, on a single local computer (e.g., any suitable commercially available computer) or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a client-server network (such as a cloud computing network), or other such network) using one or more network computers.

For clarity, only certain selected aspects of the software or HDL-based implementations are described. Other details that are well known in the art are omitted. For example, it should be understood that the disclosed technology is not limited to any specific computer language or program. For instance, the disclosed technology can be implemented by software written in C++, Java, and the like, as well as any hardware description language (e.g., Verilog, VHDL). Likewise, the disclosed technology is not limited to any particular computer or type of hardware. Certain details of suitable computers and hardware are well known and need not be set forth in detail in this disclosure.

It should also be well understood that any functionality described herein can be performed, at least in part, by one or more hardware logic components, instead of software. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Program-specific Standard Products (PSSPs), System-on-a-chip systems (SoCs), Complex Programmable Logic Devices (CPLDs), Central Processing Units (CPUs), etc.

Furthermore, any of the software-based embodiments (comprising, for example, computer-executable instructions for causing a computer to perform any of the disclosed methods) can be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.

The disclosed methods, apparatus, and systems should not be construed as limiting in any way. Instead, the present disclosure is directed toward all novel and nonobvious features and aspects of the various disclosed embodiments, alone and in various combinations and subcombinations with one another. The disclosed methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do the disclosed embodiments require that any one or more specific advantages be present or problems be solved.

In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope of these claims.

What is claimed is:
 1. An address decoder within an Integrated Circuit (IC) to access an address space, comprising: shifting hardware for receiving a constant first input and a second input associated with a window size of an address region within the address space, the shifting hardware configured to perform a multi-bit shift operation on the constant in one clock cycle to generate a mask signal on mask signal lines, wherein the multi-bit shift operation is configured to shift a number of bit positions in the one clock cycle based on the second input associated with the window size of the address region; XOR-based logic having a first input coupled to receive a base address of the address region and a second input coupled to receive a transaction address within the address region; combinatorial logic coupled to the mask signal lines and coupled to an output of the XOR-based logic, wherein an output of the combinatorial logic is configured as an indicator that identifies if the transaction address is within the address region of the address space.
 2. The address decoder of claim 1, wherein the constant is a plurality of bits set to a logical high value and wherein the multi-bit shift operation includes shifting left by the number of bit positions.
 3. The address decoder of claim 1, wherein the XOR-based logic includes a combination of XOR gates or XNOR gates.
 4. The address decoder of claim 1, wherein the address region is a first address region within the address space and the address space includes a plurality of address regions of different sizes.
 5. The address decoder of claim 4, wherein each of the plurality of address regions has a unique base address and a window size that is independent of other window sizes.
 6. The address decoder of claim 1, wherein the indicator is configured to be used to read or write a memory address within the address region.
 7. The address decoder of claim 1, wherein the shifting hardware includes a plurality of multiplexers coupled in series wherein at least one of the inputs on each multiplexer is the constant, and wherein the constant is a logic high voltage level.
 8. A method for address decoding, comprising: receiving a transaction address associated with a memory address within an address space, wherein the address space includes multiple address regions; combining the transaction address with a base address to form a combined address, the base address being an address at a start of a first address region of the multiple address regions, and wherein the transaction address is an offset into the first address region; generating a mask using shifting hardware that performs a multi-bit shift operation based on a size of the first address region; and combining the mask with the combined address to generate a selection signal used to write or read the memory address within the first address region.
 9. The method of claim 8, wherein shifting hardware includes a constant first input and a second input associated with the size of the first address region within the address space.
 10. The method of claim 9, wherein the shifting hardware performs the multi-bit shift operation on the constant in one clock cycle to generate the mask, wherein a number of bit positions to shift in the multi-bit shift operation is dependent upon the size of the first address region.
 11. The method of claim 8, wherein the combining of the base address with the transaction address includes using XOR or XNOR gates to generate the combined address.
 12. The method of claim 8, wherein the selection signal identifies if the transaction address is within the address region of the address space.
 13. The method of claim 8, wherein the generating of the selection signal occurs in an address decoder and wherein the transaction address is received by a plurality of address decoders in parallel.
 14. The method of claim 8, wherein a number of bit positions shifted by the shifting hardware corresponds to the size of the first address region.
 15. The method of claim 8, wherein the combining of the mask and the combined address is accomplished using a logical AND operator.
 16. The method of claim 8, wherein the multi-bit shift operation is performed by a single shift-left module that receives a constant logical 1 input, and the shift-left module shifts in a logical 0.
 17. The method of claim 8, wherein the shifting hardware includes multiplexers coupled in series with select lines of the multiplexers coupled to an input that corresponds to a size of the region of the address space.
 18. An integrated circuit, comprising: shifting hardware including a plurality of multiplexers coupled in series, wherein each multiplexer includes a select line that is associated with a size of an address region of an address space and at least one input coupled to a fixed voltage signal line, the shifting hardware for generating a mask signal on mask signal lines, which is an output of the shifting hardware; a XOR/XNOR logic coupled to receive a base address as one input and coupled to receive a transaction address on a second input, the XOR/XNOR logic to generate an output associated with whether the transaction address is within the address region; and combinatorial logic coupled to the mask signal lines and coupled to an output of the XOR/XNOR logic, wherein an output of the combinatorial logic is an indicator that identifies if the transaction address is within the address region of the address space.
 19. The integrated circuit of claim 15, wherein the address region is a first address region in an address space including multiple address regions and the transaction address is supplied to multiple copies of the shifting hardware in parallel.
 20. The integrated circuit of claim 18, wherein the shifting hardware is configured to generate the mask in one clock cycle.